2024-11-05
Research computing is the collection of computing, software, storage resources and services that allows for data analysis at scale.
In our particular case we are interested in leveraging research computing to augment stock assessment workflows.
Run more/bigger models in less time
Improve efficiency by running 10s - 1000s of models ‘simultaneously’.
2021 Southwest Pacific Ocean swordfish stock assessment
9,300 model runs totalling ~46 months of computation time.
Software containers
Better science
High-throughput computing (HTC)
High-performance computing (HPC)
2024 North Pacific shortfin mako shark assessment: Used HTC resources to complete ~4 months of computations (18,000 simulation-estimation model runs) in ~3 hours (1027x faster) during the working group meeting.
Example: Fitting a large spatiotemporal model in R using TMB required 128 CPUs & 1 TB RAM.
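The speed-up quoted for the mako shark assessment is the ratio of total serial computation time to elapsed wall-clock time. A rough sketch of that arithmetic, using only the rounded figures on the slide and assuming 30-day months (an assumption, not a figure from the assessment), gives a value of the same order as the reported 1027x, which presumably came from unrounded timings:

```shell
#!/bin/sh
# Back-of-the-envelope speed-up from the rounded figures on the slide.
# Assumption: one month is 30 days of continuous computation.
serial_hours=$(( 4 * 30 * 24 ))   # ~4 months of computation, in hours
wall_hours=3                      # ~3 hours of elapsed time on HTC
runs=18000                        # simulation-estimation model runs

# Speed-up = serial time / wall-clock time (integer arithmetic).
speedup=$(( serial_hours / wall_hours ))

# Average cost of one model run, in seconds.
secs_per_run=$(( serial_hours * 3600 / runs ))

echo "serial hours: $serial_hours"
echo "approx speed-up: ${speedup}x"
echo "approx seconds per run: $secs_per_run"
```

Because the runs are independent, the speed-up is roughly the number of jobs running at once, which is why HTC suits workloads of many short model runs.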
High-throughput computing (HTC)
High-performance computing (HPC)
Photo credit: NOAA
OpenScienceGrid (OSG): OSPool
OpenScienceGrid (OSG)
NOAA Hera
Both use software containers
Many may already be using containers such as GitHub Codespaces or Posit Workbench in existing cloud-based workflows
Let’s look at an example (linux-r4ss.def):
Bootstrap: docker
From: ubuntu:20.04
%post
TZ=Etc/UTC && \
ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
echo $TZ > /etc/timezone
apt update -y
apt install -y \
tzdata \
curl \
dos2unix
apt-get update -y
apt-get install -y \
build-essential \
cmake \
g++ \
libssl-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
libfontconfig1-dev \
libxml2-dev \
libgit2-dev \
wget \
tar \
coreutils \
gzip \
findutils \
sed \
gdebi-core \
locales \
nano
locale-gen en_US.UTF-8
export R_VERSION=4.3.1
curl -O https://cdn.rstudio.com/r/ubuntu-2004/pkgs/r-${R_VERSION}_1_amd64.deb
gdebi -n r-${R_VERSION}_1_amd64.deb
ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript
R -e "install.packages('remotes', dependencies=TRUE, repos='http://cran.rstudio.com/')"
R -e "install.packages('data.table', dependencies=TRUE, repos='http://cran.rstudio.com/')"
R -e "install.packages('magrittr', dependencies=TRUE, repos='http://cran.rstudio.com/')"
R -e "install.packages('mvtnorm', dependencies=TRUE, repos='http://cran.rstudio.com/')"
R -e "remotes::install_github('r4ss/r4ss')"
R -e "remotes::install_github('PIFSCstockassessments/ss3diags')"
NOW=`date`
echo "export build_date=\"$NOW\"" >> $SINGULARITY_ENVIRONMENT
mkdir -p /ss_exe
curl -L -o /ss_exe/ss_linux https://github.com/nmfs-stock-synthesis/stock-synthesis/releases/download/v3.30.21/ss_linux
chmod 755 /ss_exe/ss_linux
%environment
export PATH=/ss_exe:$PATH
%labels
Author nicholas.ducharme-barth@noaa.gov
Version v0.0.3
%help
This is a Linux (Ubuntu 20.04) container containing Stock Synthesis (version 3.30.21), R (version 4.3.1) and the R packages r4ss, ss3diags, data.table, magrittr, and mvtnorm.
National Stock Assessment Science Seminar
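A definition file like linux-r4ss.def is typically turned into an image and then used to run the tools inside it. The sketch below shows one way that might look. It assumes Apptainer is installed (older Singularity installs accept the same subcommands), that the definition file is in the current directory, and that the output name linux-r4ss.sif is acceptable; that name and the `run` helper are illustrative, not part of the original workflow. By default the script only prints the commands; set `DRY_RUN=0` to execute them. Building may need root or `--fakeroot`, depending on the system.

```shell
#!/bin/sh
# Sketch: build the image from the definition file, then use it.
# Assumptions: names below are illustrative; DRY_RUN=1 only prints commands.
DEF=linux-r4ss.def
SIF=linux-r4ss.sif              # hypothetical output image name
RUNTIME=${RUNTIME:-apptainer}   # or "singularity" on older systems
DRY_RUN=${DRY_RUN:-1}

# Helper (hypothetical): print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$RUNTIME $*"
  else
    "$RUNTIME" "$@"
  fi
}

# 1. Build the image from the definition file.
run build "$SIF" "$DEF"

# 2. Show the %help text and %labels baked into the image.
run run-help "$SIF"
run inspect --labels "$SIF"

# 3. Run R inside the container and load one of the installed packages.
run exec "$SIF" Rscript -e 'library(r4ss); sessionInfo()'

# 4. Run Stock Synthesis; /ss_exe is on PATH via %environment.
#    Assumes the current directory holds the model input files.
run exec "$SIF" ss_linux
```

The same image file can then be shipped to HTC or HPC resources, so every job runs with identical versions of Stock Synthesis, R, and the R packages.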